Representative Protein Sequence and Structure Database

نویسندگان

  • P. J. Magesh
  • Kasturi brindha
چکیده

The database provides the information about the non-redundant protein dataset (1573 proteins) obtained from the Protein Data Bank. The information includes PDB ID, Length of the protein, Resolution, PDB Secondary structure, PDB secondary structure summary, PHD secondary structure prediction, PHD secondary structure prediction summary, sequence. We further revised the PDB Secondary structure summary by reducing the eight state assignments to three states. The database provides additional information about the revised PDB secondary structure summary and revised PHD secondary structure prediction summary, often useful for further improvement of prediction methods. The proteins with unknown 3D structure and function were searched against the RPSSD and found similarity in their secondary structure patterns with the proteins of known structure in our database. This secondary structure information of proteins is furthermore useful in recognizing novel folds in proteins with unknown 3D structure. We are in a process of developing an interactive web interface with the non-redundant protein dataset to enhance the prediction methods and to make this approach as a novel fold recognition method.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

متن کامل

Structural Characteristics of Stable Folding Intermediates of Yeast Iso-1-Cytochrome-c

Cytochrome-c (cyt-c) is an electron transport protein, and it is present throughout the evolution. More than 280 sequences have been reported in the protein sequence database (www.uniprot.org). Though sequentially diverse, cyt-c has essentially retained its tertiary structure or fold. Thus a vast data set of varied sequences with retention of similar structure and fun...

متن کامل

In Silico Prediction and Docking of Tertiary Structure of Multifunctional Protein X of Hepatitis B Virus

Hepatitis B virus (HBV) infection is a universal health problem and may result into acute, fulminant, chronic hepatitis liver cirrhosis, or hepatocellular carcinoma. Sequence for protein X of HBV was retrieved from Uniprot database. ProtParam from ExPAsy server was used to investigate the physicochemical properties of the protein. Homology modeling was carried out using Phyre2 server, and refin...

متن کامل

iProsite: an improved prosite database achieved by replacing ambiguous positions with more informative representations

PROSITE database contains a set of entries corresponding to protein families, which are used to identify the family of a protein from its sequence. Although patterns and profiles are developed to be very selective, each may have false positive or negative hits. Considering false positives as items that reduce the selectiveness of a pattern, then, the more selective pattern we have, a more accur...

متن کامل

The FSSP database: fold classification based on structure-structure alignment of proteins

The FSSP database presents a continuously updated classification of 3-D protein folds based on an all-against-all comparison of structures currently in the Protein Data Bank (PDB) [Bernstein et al. (1977) J. Mol. Biol., 112, 535- 542]. The database currently contains an extended structural family for each of 600 representative protein chains which have <25% mutual sequence identity. The results...

متن کامل

3PFDB+: improved search protocol and update for the identification of representatives of protein sequence domain families

Protein domain families are usually classified on the basis of similarity of amino acid sequences. Selection of a single representative sequence for each family provides targets for structure determination or modeling and also enables fast sequence searches to associate new members to a family. Such a selection could be challenging since some of these domain families exhibit huge variation depe...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012